Interface virtualization and fast path for network on chip

ABSTRACT

Example implementations described herein are directed to a configurable Network on Chip (NoC) element that can be configured with a bypass that permits messages to pass through the NoC without entering the queue or arbitration. The configurable NoC element can also be configured to provide a protocol alongside the valid-ready protocol to facilitate valid-ready functionality across virtual channels.

CROSS-REFERENCE TO RELATED APPLICATION

This regular U.S. patent application is a continuation of U.S. patent application Ser. No. 15/829,749, filed on Dec. 1, 2017, which is based on and claims the benefit of priority under 35 U.S.C. 119 from provisional U.S. patent application No. 62/429,695, filed on Dec. 2, 2016, the entire disclosure of which is incorporated by reference herein.

BACKGROUND

Technical Field

Methods and example implementations described herein are directed to interconnect architecture, and more specifically, to Network on Chip (NoC) architectures and the design and management thereof.

Related Art

The number of components on a chip is rapidly growing due to increasing levels of integration, system complexity and shrinking transistor geometry. Complex System-on-Chips (SoCs) may involve a variety of components, e.g., processor cores, Digital Signal Processors (DSPs), hardware accelerators, memory and I/O, while Chip Multi-Processors (CMPs) may involve a large number of homogenous processor cores, memory and I/O subsystems. In both SoC and CMP systems, the on-chip interconnect plays a role in providing high-performance communication between the various components. Due to scalability limitations of traditional buses and crossbar based interconnects, Network-on-Chip (NoC) has emerged as a paradigm to interconnect a large number of components on the chip. NoC is a global shared communication infrastructure made up of several routing nodes interconnected with each other using point-to-point physical links.

Messages are injected by the source and are routed from the source node to the destination over multiple intermediate nodes and physical links. The destination node then ejects the message and provides the message to the destination. For the remainder of this application, the terms ‘components’, ‘blocks’, ‘hosts’ or ‘cores’ will be used interchangeably to refer to the various system components which are interconnected using a NoC. Terms ‘routers’ and ‘nodes’ will also be used interchangeably. Without loss of generalization, the system with multiple interconnected components will itself be referred to as a ‘multi-core system’.

There are several topologies in which the routers can connect to one another to create the system network. Bi-directional rings (as shown in FIG. 1(a)), 2D (two dimensional) mesh (as shown in FIG. 1(b)) and 2D Torus (as shown in FIG. 1(c)) are examples of topologies in the related art. Mesh and Torus can also be extended to 2.5-D (two and a half dimensional) or 3-D (three dimensional) organizations. FIG. 1(d) shows a 3D mesh NoC, where there are three layers of 3×3 2D mesh NoC shown over each other. The NoC routers have up to two additional ports, one connecting to a router in the higher layer, and another connecting to a router in the lower layer. Router 111 in the middle layer of the example has both ports used, one connecting to the router at the top layer and another connecting to the router at the bottom layer. Routers 110 and 112 are at the bottom and top mesh layers respectively, therefore they have only the upper facing port 113 and the lower facing port 114 respectively connected.

Packets are message transport units for intercommunication between various components. Routing involves identifying a path composed of a set of routers and physical links of the network over which packets are sent from a source to a destination. Components are connected to one or multiple ports of one or multiple routers, with each such port having a unique ID. Packets carry route information such as the destination's router and port ID for use by the intermediate routers to route the packet to the destination component.

Examples of routing techniques include deterministic routing, which involves choosing the same path from A to B for every packet. This form of routing is independent from the state of the network and does not load balance across path diversities, which might exist in the underlying network. However, such deterministic routing may be implemented in hardware, maintains packet ordering and may be rendered free of network level deadlocks. Shortest path routing may minimize the latency as such routing reduces the number of hops from the source to the destination. For this reason, the shortest path may also be the lowest power path for communication between the two components. Dimension-order routing is a form of deterministic shortest path routing in 2D, 2.5-D, and 3-D mesh networks. In this routing scheme, messages are routed along each coordinate in a particular sequence until the message reaches the final destination. For example, in a 3-D mesh network, a message may first be routed along the X dimension until it reaches a router whose X-coordinate is equal to the X-coordinate of the destination router. Next, the message takes a turn and is routed along the Y dimension, and finally takes another turn and moves along the Z dimension until the message reaches the final destination router. Dimension ordered routing may be minimal turn and shortest path routing.

FIG. 2(a) pictorially illustrates an example of XY routing in a two dimensional mesh. More specifically, FIG. 2(a) illustrates XY routing from node ‘34’ to node ‘00’. In the example of FIG. 2(a), each component is connected to only one port of one router. A packet is first routed over the x-axis until the packet reaches node ‘04’ where the x-coordinate of the node is the same as the x-coordinate of the destination node. The packet is next routed over the y-axis until the packet reaches the destination node.
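As an illustration only, dimension-order routing of the kind described above can be sketched in Python. The function name `dimension_order_route` and the representation of a node as a coordinate tuple are assumptions made for this sketch; node ‘34’ of FIG. 2(a) is read here as (x=3, y=4), which is consistent with the packet turning at node ‘04’.

```python
def dimension_order_route(src, dst):
    """Return the list of nodes visited from src to dst.

    Coordinates are corrected one dimension at a time, in index order
    (X first, then Y, then Z if present). Hypothetical helper, not taken
    from the disclosure.
    """
    path = [tuple(src)]
    cur = list(src)
    for axis in range(len(cur)):
        # Direction of travel along this dimension.
        step = 1 if dst[axis] > cur[axis] else -1
        while cur[axis] != dst[axis]:
            cur[axis] += step
            path.append(tuple(cur))
    return path


# XY route for the FIG. 2(a) example, assuming node '34' means (x=3, y=4).
xy_path = dimension_order_route((3, 4), (0, 0))
```

Because each dimension is traversed only toward the destination, the number of hops equals the sum of the coordinate differences, and the route makes at most one turn per additional dimension.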

In heterogeneous mesh topology in which one or more routers or one or more links are absent, dimension order routing may not be feasible between certain source and destination nodes, and alternative paths may have to be taken. The alternative paths may not be shortest or minimum turn.

Source routing and routing using tables are other routing options used in NoC. Adaptive routing can dynamically change the path taken between two points on the network based on the state of the network. This form of routing may be complex to analyze and implement.

A NoC interconnect may contain multiple physical networks. Over each physical network, there may exist multiple virtual networks, wherein different message types are transmitted over different virtual networks. In this case, at each physical link or channel, there are multiple virtual channels; each virtual channel may have dedicated buffers at both end points. In any given clock cycle, only one virtual channel can transmit data on the physical channel.

The physical channels are shared into a number of independent logical channels called virtual channels (VCs). VCs provide multiple independent paths to route packets, however they are time-multiplexed on the physical channels. A virtual channel holds the state needed to coordinate the handling of the flits of a packet over a channel. At a minimum, this state identifies the output channel of the current node for the next hop of the route and the state of the virtual channel (idle, waiting for resources, or active). The virtual channel may also include pointers to the flits of the packet that are buffered on the current node and the number of flit buffers available on the next node.
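The per-channel state listed above can be pictured as a small record. The following Python sketch is illustrative only; the class and field names (`VirtualChannel`, `downstream_credits`, and so on) are assumptions, and the forwarding rule shown is one plausible reading rather than a required behavior.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class VCState(Enum):
    IDLE = "idle"
    WAITING = "waiting for resources"
    ACTIVE = "active"


@dataclass
class VirtualChannel:
    # Output channel of the current node for the next hop (None until routed).
    output_channel: Optional[int] = None
    state: VCState = VCState.IDLE
    # Flits of the packet buffered on the current node.
    buffered_flits: List[str] = field(default_factory=list)
    # Number of flit buffers known to be free on the next node.
    downstream_credits: int = 0

    def can_forward(self) -> bool:
        """A flit may leave only if the VC is active, holds a flit,
        and the next node has buffer space for it."""
        return (self.state is VCState.ACTIVE
                and bool(self.buffered_flits)
                and self.downstream_credits > 0)

    def forward(self) -> str:
        """Remove and return the oldest buffered flit, consuming one credit."""
        if not self.can_forward():
            raise RuntimeError("virtual channel cannot forward a flit now")
        self.downstream_credits -= 1
        return self.buffered_flits.pop(0)
```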

NoC interconnects may employ wormhole routing, wherein a large message or packet is broken into small pieces known as flits (also referred to as flow control digits). The first flit is the header flit, which holds information about this packet's route and key message level info along with payload data and sets up the routing behavior for all subsequent flits associated with the message. Optionally, one or more body flits follow the head flit, containing the remaining payload of data. The final flit is the tail flit, which in addition to containing the last payload also performs some bookkeeping to close the connection for the message. In wormhole flow control, virtual channels are often implemented.
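A minimal sketch of breaking a message into head, body and tail flits is given below. The names `Flit` and `packetize`, the byte-oriented payload, and the choice to carry route information only in the head flit are assumptions made for illustration; a single-flit packet is represented here by a lone head flit.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Flit:
    kind: str                          # "head", "body" or "tail"
    route: Optional[Tuple[int, int]]   # route info, carried by the head flit only
    payload: bytes


def packetize(payload: bytes, route: Tuple[int, int], flit_bytes: int) -> List[Flit]:
    """Split a message into flits of at most flit_bytes of payload each."""
    if flit_bytes <= 0:
        raise ValueError("flit_bytes must be positive")
    chunks = [payload[i:i + flit_bytes]
              for i in range(0, len(payload), flit_bytes)] or [b""]
    last = len(chunks) - 1
    flits = []
    for i, chunk in enumerate(chunks):
        if i == 0:
            kind = "head"   # a single-flit packet has only this flit
        elif i == last:
            kind = "tail"
        else:
            kind = "body"
        flits.append(Flit(kind, route if i == 0 else None, chunk))
    return flits


flits = packetize(b"x" * 20, route=(0, 0), flit_bytes=8)
```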

The term “wormhole” plays on the way messages are transmitted over the channels: the output port at the next router can be so short that received data can be translated in the head flit before the full message arrives, thereby facilitating the sending of the packet to the next router before the packet is fully received. This allows the router to quickly set up the route upon arrival of the head flit and then opt out from the rest of the conversation. Since a message is transmitted flit by flit, the message may occupy several flit buffers along its path at different routers so that the packet can exist in multiple routers, thereby creating a worm-like image.

Based upon the traffic between various end points, and the routes and physical networks that are used for various messages, different physical channels of the NoC interconnect may experience different levels of load and congestion. The capacity of various physical channels of a NoC interconnect is determined by the width of the channel (number of physical wires) and the clock frequency at which it is operating. Various channels of the NoC may operate at different clock frequencies, and various channels may have different widths based on the bandwidth requirement at the channel. The bandwidth requirement at a channel is determined by the flows that traverse over the channel and their bandwidth values. Flows traversing over various NoC channels are affected by the routes taken by various flows. In a mesh or Torus NoC, there may exist multiple route paths of equal length or number of hops between any pair of source and destination nodes. For example, in FIG. 2(b), in addition to the standard XY route between nodes 34 and 00, there are additional routes available, such as YX route 203 or a multi-turn route 202 that makes more than one turn from source to destination.

In a NoC with statically allocated routes for various traffic flows, the load at various channels may be controlled by intelligently selecting the routes for various flows. When a large number of traffic flows and substantial path diversity is present, routes can be chosen such that the load on all NoC channels is balanced nearly uniformly, thus avoiding a single point of bottleneck. Once routed, the NoC channel widths can be determined based on the bandwidth demands of flows on the channels. Unfortunately, channel widths cannot be arbitrarily large due to physical hardware design restrictions, such as timing or wiring congestion. There may be a limit on the maximum channel width, thereby putting a limit on the maximum bandwidth of any single NoC channel.

Additionally, wider physical channels may not help in achieving higher bandwidth if messages are short. For example, if a packet is a single flit packet with a 64-bit width, then no matter how wide a channel is, the channel will only be able to carry 64 bits per cycle of data if all packets over the channel are similar. Thus, a channel width is also limited by the message size in the NoC. Due to these limitations on the maximum NoC channel width, a channel may not have enough bandwidth in spite of balancing the routes.
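The arithmetic behind this limit can be made concrete with a short Python sketch. The function names are hypothetical, and the model assumes that each packet starts on a fresh cycle, so a packet narrower than the channel leaves the remaining wires idle for that cycle.

```python
def useful_bits_per_cycle(channel_width_bits: int, packet_bits: int) -> int:
    """Data bits actually carried per cycle when all packets are packet_bits wide.

    Assumes one packet (or one channel-wide slice of a larger packet) per cycle.
    """
    return min(channel_width_bits, packet_bits)


def channel_utilization(channel_width_bits: int, packet_bits: int) -> float:
    """Fraction of the channel's wires that carry data each cycle."""
    return useful_bits_per_cycle(channel_width_bits, packet_bits) / channel_width_bits


# The example from the text: 64-bit single-flit packets on a 256-bit channel
# still move only 64 bits per cycle, i.e. one quarter of the channel.
bits = useful_bits_per_cycle(256, 64)
utilization = channel_utilization(256, 64)
```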

To address the above bandwidth concern, multiple parallel physical NoCs may be used. Each NoC may be called a layer, thus creating a multi-layer NoC architecture. Hosts inject a message on a NoC layer; the message is then routed to the destination on the NoC layer, where it is delivered from the NoC layer to the host. Thus, each layer operates more or less independently from each other, and interactions between layers may only occur during the injection and ejection times. FIG. 3(a) illustrates a two layer NoC. Here the two NoC layers are shown adjacent to each other on the left and right, with the hosts connected to the NoC replicated in both left and right diagrams. A host is connected to two routers in this example: a router in the first layer shown as R1, and a router in the second layer shown as R2. In this example, the multi-layer NoC is different from the 3D NoC, i.e. multiple layers are on a single silicon die and are used to meet the high bandwidth demands of the communication between hosts on the same silicon die. Messages do not go from one layer to another. For purposes of clarity, the present disclosure will utilize such a horizontal left and right illustration for multi-layer NoC to differentiate from the 3D NoCs, which are illustrated by drawing the NoCs vertically over each other.

In FIG. 3(b), a host connected to a router from each layer, R1 and R2 respectively, is illustrated. Each router is connected to other routers in its layer using directional ports 301, and is connected to the host using injection and ejection ports 302. A bridge-logic 303 may sit between the host and the two NoC layers to determine the NoC layer for an outgoing message and send the message from the host to the NoC layer, and also to perform the arbitration and multiplexing between incoming messages from the two NoC layers and deliver them to the host.

In a multi-layer NoC, the number of layers needed may depend upon a number of factors such as the aggregate bandwidth requirement of all traffic flows in the system, the routes that are used by various flows, message size distribution, maximum channel width, etc. Once the number of NoC layers in NoC interconnect is determined in a design, different messages and traffic flows may be routed over different NoC layers. Additionally, one may design NoC interconnects such that different layers have different topologies in number of routers, channels and connectivity. The channels in different layers may have different widths based on the flows that traverse over the channel and their bandwidth requirements.

In a NoC interconnect, if the traffic profile is not uniform and there is a certain amount of heterogeneity (e.g., certain hosts talking to each other more frequently than the others), the interconnect performance may depend on the NoC topology, where various hosts are placed in the topology with respect to each other, and to what routers they are connected. For example, if two hosts talk to each other frequently and require higher bandwidth than other interconnects, then they should be placed next to each other. This will reduce the latency for this communication, which thereby reduces the global average latency, as well as reduce the number of router nodes and links over which the higher bandwidth of this communication must be provisioned.

A NoC uses a shared network to pass traffic between different components. Any particular traffic flow might cross multiple routers before arriving at its destination. While the NoC can be efficient in terms of sharing wires, there can be an adverse effect on latency. Each router needs to arbitrate between its various input ports to decide which packet will be sent in a cycle. After the arbitration, the data must be selected through a multiplexing (muxing) structure. This process can take one or more cycles to complete, depending on the microarchitecture of the routers and the frequency. This means that for each router a traffic flow must cross, it can incur additional cycles of delay. Wire delay between routers can also cause delay.

To reduce latency, the routers can be built with bypass paths that allow skipping some or all of the arbitration and muxing costs of a router. These bypass paths can be used opportunistically when the router is idle, or they can support a simpler arbitration that allows a significant decrease in cycle time loss. Intelligent use of bypasses in a system can improve average latency of requests.

Longer latency can hurt the performance of the system. Reducing the latency of traffic flows is an important goal. The benefit of lower latency varies between different traffic flows. Some components are very latency sensitive, where each additional cycle of latency can cause a significant performance reduction. Other flows will be less sensitive to latency. Intelligent setup of the bypasses can select the traffic flows that will provide the largest overall benefit to the system performance.

When packets finish traversing a NoC, they arrive at the interface to a component. Because a NoC can have many different kinds of traffic, design of the interface can have a big impact on performance. Many interface protocols use a method of flow control that doesn't distinguish between the contents of the packets. This can create head-of-line blocking issues, where a more important packet is stuck behind a less important packet.

The destination component can often benefit from distinguishing between different incoming traffic flows, allowing it to accept the more important flows and hold off the less important flows when resources are scarce. Support of an enhanced interface can allow the destination component to signal the network which traffic flows it is willing to accept. The network can then choose which packets to send, avoiding the head-of-line blocking issue.

The enhanced interface flow control can be coupled with the network's use of virtual or physical channels to further avoid head-of-line blocking. If lower priority packets are transported in a separate channel from the higher priority packets, the destination component can backpressure one channel and allow the other to continue unimpeded.

SUMMARY

Therefore, to address the aforementioned problems, there is a need for systems, methods, and non-transitory computer readable mediums to facilitate an opportunistic bypass system for a NoC, as well as a VC valid and credit system to facilitate the management of VCs of the NoC.

Aspects of the present disclosure involve a Network on Chip (NoC) having a plurality of channels and a valid-ready system with VC valid and VC credit going back, element configured to send a valid signal with a VC valid signal.

Aspects of the present disclosure further involve a network on chip (NoC) element involving a plurality of physical links and virtual links, and a configurable bypass between virtual links, and bypass logic configured to bypass the queue and the logic of the NoC element.

The bypass is configured to bypass the queue and the logic of the NoC element in an opportunistic manner in accordance with the desired implementation. The NoC can also involve a configurable router that has complete configurability in terms of which bypasses are available. The configurable router has output ports, in which any select input port can connect to an output port with a direct bypass.

Aspects of the present disclosure can further include methods and computer readable mediums directed to determining the selection of bypasses for NoC construction. Such methods and computer readable mediums can include algorithms that, during NoC construction, create additional opportunities for bypassing. Such algorithms can include restrictions to bypass placement (e.g., connections requiring upsizing and downsizing do not have bypass), reshaping the NoC topology to create more links for the bypass, building the NoC to have equal number of ports with no clock crossing, and avoiding upsizing and downsizing links.

In example implementations, the algorithms for the creation of bypass paths can involve determining the possible bypass opportunities for the configurations based on restrictions; for each bypass opportunity, choosing which inputs go to the output based on calculation of the traffic flows/bandwidth that are expected to have the biggest impact on the specification (e.g., a weighted average of traffic, also taking latency and importance of traffic into consideration); and selecting the bypasses with the biggest benefit.

In example implementations, there can be algorithms such as a multiplexer selection algorithm to select which multiplexer to use (e.g., preselected versus post selected), and opportunistic bypass processing (e.g., messages are sent through bypass if bypass is idle or if bypass is possible, bypass conducted based on latency and First In First Out (FIFO) depth).

In example implementations, there can be NoC elements and configuration methods wherein a single input port could be selected for use as a bypass to multiple output ports subject to restrictions (e.g., output VC must be the same size as the input, different physical link sizes involve bypass links with matching VCs).

BRIEF DESCRIPTION OF THE DRAWINGS

FIGS. 1(a), 1(b), 1(c) and 1(d) illustrate examples of Bidirectional ring, 2D Mesh, 2D Torus, and 3D Mesh NoC Topologies.

FIG. 2(a) illustrates an example of XY routing in a related art two dimensional mesh.

FIG. 2(b) illustrates three different routes between source and destination nodes.

FIG. 3(a) illustrates an example of a related art two layer NoC interconnect.

FIG. 3(b) illustrates the related art bridge logic between host and multiple NoC layers.

FIG. 4 illustrates an example of a router, in accordance with an example implementation.

FIG. 5 illustrates an example flow diagram for configuring routers during configuration time, in accordance with an example implementation.

FIG. 6 illustrates a valid-ready architecture in accordance with an example implementation.

FIG. 7(a) illustrates an example system having a SoC element (master), a SoC element (slave), a NoC bridge and a NoC, in accordance with an example implementation. In the example implementation, the NoC bridges and the NoC elements have four input VCs and four output VCs. A single physical wire proceeds from the SoC element to the bridge, whereupon the signal is fanned out to each NoC element in four output VCs.

FIG. 7(b) illustrates an example architecture for a NoC element, in accordance with an example implementation.

FIG. 8 illustrates an example table view of information utilized by the NoC element, in accordance with an example implementation.

FIG. 9 illustrates a flow diagram for a requesting NoC element, in accordance with an example implementation.

DETAILED DESCRIPTION

The following detailed description provides further details of the figures and example implementations of the present application. Reference numerals and descriptions of redundant elements between figures are omitted for clarity. Terms used throughout the description are provided as examples and are not intended to be limiting. For example, the use of the term “automatic” may involve fully automatic or semi-automatic implementations involving user or administrator control over certain aspects of the implementation, depending on the desired implementation of one of ordinary skill in the art practicing implementations of the present application.

In example implementations, a NoC interconnect is generated from a specification by utilizing design tools. The specification can contain constraints such as bandwidth/Quality of Service (QoS)/latency attributes that are to be met by the NoC, and can be in various software formats depending on the design tools utilized. Once the NoC is generated through the use of design tools on the specification to meet the specification requirements, the physical architecture can be implemented either by manufacturing a chip layout to facilitate the NoC or by generation of a register transfer level (RTL) for execution on a chip to emulate the generated NoC, depending on the desired implementation.

In a NoC, there is a network having routers and bridges. Other elements may also be present which can make the NoC fairly large. There may be an inherent latency problem with the NoC. In example implementations, bridges require activation to send messages into the network, and when messages are sent through the link, the router has to arbitrate the messages before the message is sent to the next hop.

For each hop running at a slow frequency, an entire router arbitration calculation including the travel time can be determined. However, most related art implementations are executed at a high frequency, and in such cases the router arbitration may be conducted in a single cycle. Further, latency can be incurred in the bridge, with a cycle incurred in the bridge, a cycle for the link, a cycle for the router, and so on for the transaction. Latency reduction can be difficult due to the routers having arbitration requirements which incur a latency loss for arbitration in each router.

FIG. 4 illustrates an example of a router, in accordance with an example implementation. In example implementations for reducing latency in the router, routers implement a fast path, which functions as a bypass having bypass logic 406. A router may have an assortment of inputs which are processed by elements such as a decoder 401, a queue such as a First in First Out (FIFO) queue 402, an arbiter 403 and a multiplexer 404 (mux) for conducting arbitration and determining the output 405. In example implementations, alongside the multiplexer 404, the router has a path configured to function as a special bypass with bypass logic 406. One or more inputs can be designated for the special bypass, such that the input entering one of the muxes will be able to hop in at the end of a cycle. If there is an output, the output can be placed in at the end of a cycle so that the input into the router will be able to go directly to the output instead of going through the arbitration. In such an example implementation, one cycle of latency can thereby be removed per router by reducing the processing to decode, bypass logic (e.g. validation) and output. Routing information can be included in direct wires to the router in accordance with the desired implementation. Further, once latency is reduced, the potential round trip latency is decreased as other messages may be able to pop off the FIFO more quickly. Once the bypasses are configured for each eligible router, the example implementations could then calculate the cycle of depth based on this latency. Example implementations of a NoC contain hardware or NoC elements that involve a plurality of physical links and virtual links, with a configurable bypass between virtual links, and bypass logic 406 configured to bypass the queue and the logic of the NoC element. The bypass logic 406 can be configured to initiate bypass of the message in an opportunistic manner (e.g., depending on whether the queue is free or not, etc.).

In example implementations, messages destined to bypass can be pre-arbitrated and then the only logic in the hop can be for determining which output channel is used for the bypass as determined by the bypass logic as illustrated in FIG. 4. In example implementations, multiple outputs can be used for bypass for an input. For example, one input can bypass to one of multiple output ports, with each output associated with only one input. Bypass logic may also be utilized for optimizing messages in accordance with an example implementation. For example, if a queue is empty the message is sent through the logic for the bypass. If no other message takes priority then the message is transmitted through the bypass path to avoid all logic. Such example implementations can therefore be configured to conduct more than simply bypassing the FIFO queue and entering arbitration, but can be utilized to bypass all router logic and go directly to the output. In example implementations, the bypass can be conducted when there is no other traffic going on the link, which indicates no cost to arbitration as determined by the bypass logic.

In the following example implementations, requirements may be set for forwarding an input to the special bypass. One example requirement is that the link sizes are matched so latency from a width conversion is removed. Another example requirement is no clock crossing, so latency from clock conversion is also removed. Other requirements may also be set according to the desired implementation.
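The two example requirements can be expressed as a simple eligibility check. The sketch below is illustrative; the `Link` record, its field names and the function `bypass_eligible` are assumptions, and a real design tool may apply additional requirements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    name: str
    width_bits: int      # link size
    clock_domain: str    # identifier of the clock driving the link


def bypass_eligible(inp: Link, out: Link) -> bool:
    """An input may be forwarded to the bypass only if no width conversion
    and no clock crossing would be needed on the way to the output."""
    same_width = inp.width_bits == out.width_bits
    same_clock = inp.clock_domain == out.clock_domain
    return same_width and same_clock


west = Link("west_in", 128, "clk_a")
north = Link("north_out", 128, "clk_a")
east = Link("east_out", 256, "clk_a")
host = Link("host_out", 128, "clk_b")
```

With these hypothetical links, west-to-north qualifies, while west-to-east (width conversion) and west-to-host (clock crossing) do not.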

Related art implementations implement a bypass path in a fixed position that is affixed to an input that is considered to be the most common bypass user. One example of a related art implementation is that an input destined for a particular direction will continually proceed in the direction (e.g. a south input port bypasses to the north output port). Such related art solutions are static.

In example implementations as illustrated in FIG. 4, there can be a NoC hardware element which can involve a plurality of physical channels and virtual channels, and a configurable bypass between virtual links, whereupon bypass logic can be configured to bypass the queue and the logic of the NoC element in an opportunistic manner. The bypass logic can allow messages to be transmitted through the bypass opportunistically based on whether the input First in First Out (FIFO) queue is empty or not, based on the priority of the traffic being arbitrated, whether the bypass is idle/available or not, queue depth of the transmitting hardware element, and so on depending on the desired implementation.
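One way to picture the opportunistic decision is the following Python sketch. It is a behavioral illustration under stated assumptions, not the bypass logic 406 itself: the class name `InputPort`, the method `accept`, and the particular combination of conditions (empty FIFO, idle bypass, no higher-priority traffic pending) are chosen for the example.

```python
from collections import deque


class InputPort:
    """Behavioral model of one router input with an opportunistic bypass."""

    def __init__(self):
        self.fifo = deque()  # messages waiting for normal arbitration

    def accept(self, message, bypass_idle: bool, higher_priority_pending: bool) -> str:
        """Decide how an arriving message is handled.

        The message takes the bypass only when nothing is queued ahead of it,
        the bypass path is idle, and no other traffic takes priority.
        Otherwise it is queued for the regular arbitration path.
        """
        if not self.fifo and bypass_idle and not higher_priority_pending:
            return "bypass"
        self.fifo.append(message)
        return "queued"


port = InputPort()
first = port.accept("m0", bypass_idle=True, higher_priority_pending=False)
second = port.accept("m1", bypass_idle=False, higher_priority_pending=False)
third = port.accept("m2", bypass_idle=True, higher_priority_pending=False)
```

In this sketch the third message is queued even though the bypass is idle, because an earlier message is still waiting in the FIFO and bypassing it would reorder traffic from the same input.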

In example implementations, the bypass configuration can be made during configuration time for the specification. FIG. 5 illustrates an example flow diagram for configuring routers during configuration time, in accordance with an example implementation. At 501, the specification is processed for traffic flows. During configuration time, the example implementations determine all of the traffic flows from the specification, wherein routers that are eligible for bypass are identified at 502. In an example implementation, if all the traffic flows can be considered during configuration time, traffic tendencies can be identified for a router (e.g. most traffic for an identified router proceeds from the west port to the north port). In the above example, a bypass can be constructed from the west port to the north port to reduce latency. Other implementations based on the traffic flow for identifying routers are also possible depending on the desired implementation. For example, latency sensitivity of traffic flows can also be recognized. In this manner, example implementations can be configured to determine the bypass not only by the largest amount of traffic going through a port, but also by the importance of the traffic. Traffic flows can be associated with a weight in terms of the importance of the latency, e.g. how latency sensitive the traffic is, which can be taken into account for identifying eligible routers. Example implementations can calculate the latency sensitivity based on the weights. For example, latency sensitive traffic can be multiplied by the weight to prioritize latency sensitive traffic over raw latency for a channel, depending on the desired implementation.

Example implementations can also analyze traffic flows so that an array is created based on the input ports (e.g. A, B, C, D, E, and F), and analyze how much of the traffic coming in on a given link is going to a given output port. So for a given output port, analysis can be conducted by comparing the input ports and constructing a bypass based on the bandwidth consumed by the input ports to a given output port. For example, for a router wherein input port one is responsible for three gigabytes of output for output port X for a given time frame and input port two is responsible for six gigabytes for the given time frame, a bypass can be utilized between input port two and output port X.
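The per-output comparison can be sketched as follows. This is a hedged illustration: the function name `select_bypass_inputs`, the dictionary keyed by (input port, output port), and the optional multiplicative latency weight (defaulting to 1.0) are assumptions made for the example rather than a prescribed algorithm.

```python
def select_bypass_inputs(traffic, weights=None):
    """Pick, for each output port, the input port that should own its bypass.

    traffic: {(input_port, output_port): volume over the analysis window}
    weights: optional {(input_port, output_port): latency-sensitivity weight}
    Returns {output_port: input_port}, so each output is associated with
    exactly one input.
    """
    weights = weights or {}
    best = {}  # output_port -> (input_port, score)
    for (inp, out), volume in traffic.items():
        score = volume * weights.get((inp, out), 1.0)
        if out not in best or score > best[out][1]:
            best[out] = (inp, score)
    return {out: inp for out, (inp, _score) in best.items()}


# The example from the text: input one sends 3 GB and input two sends 6 GB
# to output X in the same time frame.
traffic = {("in1", "X"): 3.0, ("in2", "X"): 6.0}
by_bandwidth = select_bypass_inputs(traffic)

# If input one carries latency-sensitive traffic with a (hypothetical)
# weight of 3, its weighted score (9) exceeds that of input two (6).
by_weight = select_bypass_inputs(traffic, weights={("in1", "X"): 3.0})
```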

At 503, locations for implementing a bypass are identified. The locations for implementing the bypass can be identified based on the traffic flow determinations, the hardware configuration of the router and by other methods according to the desired implementation. For example, simulations can be conducted in which latency, as affected by wire length and travel length, is taken into consideration. In such example implementations, output ports can be configured so that a bypass can be made available within the router. By converting the router with additional output ports, latency can be reduced. Thus, in example implementations, the optimization can involve determining which bypasses can be implemented to reduce latency and the location of such bypass. The optimization can involve a pre-optimization implementation where conditions for bypassing are identified, and bypasses can be implemented therein. By using design tools during the configuration time, path input algorithms can be utilized to determine the shortest path for the bypass for use in determining the location for implementing the bypass. Optimizations for placement of network elements can also be made to create additional opportunities for bypass in accordance with the desired implementation.

Bypasses may also be determined based on desired constraints. In an example constraint, the input VC width is set to match the output VC width. In such an example implementation, the physical link size may be different; however, the bypass is still utilized between the two physical links to connect matching input and output VCs.

At 504, the eligible routers are then configured with the bypass based on the determinations. As the routers are configurable in example implementations, a heterogeneous NoC with heterogeneous routers can thereby be implemented. Example implementations are in contrast to related art systems, which are directed to homogenous NoC systems and homogenous routers. Related art implementations involve bypasses that are stacked directionally on the assumption that the NoC is homogenous and is therefore static, whereas the example implementations of the present disclosure can utilize heterogeneous router and NoC configurations.

Example implementations described herein can be implemented as a hardwired bypass. In such example implementations, the software at configuration time can precompute where packets are going and can also utilize sideband information to the NoC. Sideband channels can be utilized for messages to determine which output port to utilize. Sideband information does not need to be utilized for controlling multiplexing to the output ports, but can be utilized to control the validity of the output port. The routing information is processed, wherein example implementations calculate the route including the port.

As illustrated in FIG. 5, example implementations can also involve methods and computer readable mediums with instructions directed to determining the selection of bypasses for NoC construction. Such example implementations can involve algorithms that, during NoC construction, create additional opportunities for bypassing. The opportunities can involve the reshaping of NoC topology to create more channels that are eligible for bypass (e.g., building a NoC with routers having equal numbers of ports without any clock crossing), applying restrictions to bypass to avoid channels or virtual channels that conduct upsizing and downsizing, and so on depending on the desired implementation.

Example implementations can also involve algorithms for the creation of bypass paths. As illustrated in FIG. 5, such algorithms determine all of the possible bypass opportunities for the configurations based on the restrictions as described above. For each possible bypass, the algorithm can then determine which inputs go to which output based on the calculation of expected traffic flows/bandwidth. Such example implementations will determine which bypass provides the biggest impact on the NoC specification (weighted average of traffic, also taking latency and importance of traffic into consideration), whereupon the algorithm can thereby choose bypasses with the biggest benefit above a desired threshold.
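
One possible way to express the selection described above is sketched below. This is a hedged illustration under stated assumptions rather than the disclosed algorithm: the Flow record, the latency_weight field, the scoring rule (bandwidth multiplied by weight, summed per input/output pair), the limit of one bypass per output port, and the function name select_bypasses are all hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    in_port: str
    out_port: str
    bandwidth: float       # expected bandwidth, arbitrary units (assumed)
    latency_weight: float  # assumed importance of latency; 1.0 is neutral


def select_bypasses(flows, eligible, threshold):
    """Score each eligible (input, output) pair and keep the best pair per output port."""
    scores = {}
    for f in flows:
        key = (f.in_port, f.out_port)
        if key in eligible:  # pairs ruled out by restrictions are never scored
            scores[key] = scores.get(key, 0.0) + f.bandwidth * f.latency_weight
    chosen = {}
    # Visit candidates from highest to lowest score and stop below the threshold.
    for (in_port, out_port), score in sorted(scores.items(), key=lambda kv: -kv[1]):
        if score < threshold:
            break
        if out_port not in chosen:  # assumption: one bypass per output port
            chosen[out_port] = (in_port, score)
    return chosen


flows = [
    Flow("west", "north", 4.0, 1.0),
    Flow("east", "north", 2.0, 3.0),   # less traffic, but latency sensitive
    Flow("south", "east", 1.0, 1.0),
    Flow("west", "south", 5.0, 1.0),   # excluded below, e.g. due to VC resizing
]
eligible = {("west", "north"), ("east", "north"), ("south", "east")}
selected = select_bypasses(flows, eligible, threshold=2.0)
```

With these made-up numbers, the east-to-north pair is preferred over west-to-north even though it carries less raw traffic, because its assumed latency weight is higher, and the south-to-east pair is dropped for falling below the threshold.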

Example implementations may also involve algorithms for selecting which multiplexer to incorporate into the NoC hardware element, which can be conducted in a preselected manner or configured after the NoC is designed, in accordance with the desired implementation.

Example implementations may also involve NoCs with hardware elements having differing physical channel sizes, but VCs with matching sizes to facilitate the bypass. The hardware elements may also be in the form of a configurable router that has complete configurability in terms of which bypasses are available. In an example implementation, the router design can involve having each output port associated with a selected input port with a direct bypass. Further, example implementations may involve a NoC element and configuration method wherein a single input port could be selected for bypass to multiple output ports with restrictions (e.g., if the output VC is the same size as the input).

Virtualization Interface and Valid-Ready for Virtual Channels (VCs) and Other Types of Traffic

In related art implementations, NoC systems utilize a valid/ready handshake. In such a handshake protocol, one NoC element asserts a valid signal, and if the receiving NoC element asserts a ready signal at the same time, then a message transfer can occur between the two NoC elements. Such related art implementations may further have restrictions depending on the implementation (e.g. to prevent deadlock). In an example restriction, the NoC element does not wait for the valid signal to assert a ready signal, or vice versa. However, related art implementations of the valid/ready handshake are not aware of the actual status of VCs. In related art implementations, even if a request is made using the valid/ready handshake, the VC to be used may actually be blocked. Further, other VCs within the physical channel may be available, but the related art implementations cannot discern their availability due to the NoC elements requiring a ready signal before proceeding. Such implementations may also apply to other traffic types where the valid/ready handshake is blocking the transmission. The destination element would benefit from being able to indicate which traffic flows it would like to receive through the issuance of credits or indication through the ready signal for that specific traffic type.

In example implementations, additional information is provided for a valid-ready handshake to address the issues with the related art. Example implementations utilize a valid-ready and credit based hybrid system to facilitate valid-ready handshake functionality. In a credit-based design for the example implementations, independent credits are allocated for each VC. The requesting NoC element transmits a request when a VC credit has been obtained.

Related art implementations utilize a sideband information channel to indicate which virtual channels are available. However, such information is potentially stale. Further, such implementations provide a bit vector that indicates VCs within a range are available (e.g. VCs 8-16) without specifically indicating which VCs are available and which are not.

FIG. 6 illustrates a valid-ready architecture in accordance with an example implementation. In example implementations, a hybrid approach involving a credit-based system is utilized, which facilitates a bi-directional communication. For a NoC requesting element 601 and a NoC target element 602, there is a valid-ready handshake as well as another vector for VC valid and VC credit in the sideband. The VC valid information is provided to the NoC requesting element 601, so that the NoC requesting element makes the request if a specific resource dedicated to the request is available. Such example implementations provide flexibility as the number of virtual channels can be any number in accordance with a desired implementation.

In example implementations, a number of VCs on the NoC are associated with a physical interface. The physical interface can be associated with a number of interface VCs which can be mapped according to the desired implementation.

In an example implementation involving a master and slave, a NoC bridge is utilized. The NoC bridge communicates with a slave, which may have a plurality of virtual channels for the traffic. One virtual channel may involve high-priority CPU traffic (e.g. latency-sensitive traffic), another may involve I/O traffic, and another may involve asynchronous traffic which may be time critical, and so on. The properties of the virtual channels may also change over time, depending on the desired implementation.

In example implementations involving a credit-based implementation, as each channel can be separated and dedicated to the desired implementation, such implementations avoid the merger of traffic flows that should not be merged.

In the example implementation hybrid approach, the credit-based handshake is conducted between the agents while valid-ready requirements are enforced. In an example implementation, the target sends a credit back to the master indicating that a resource is available for a request. When the master tries to make that request, the target can indicate that it is not ready due to some delay (e.g. clock crossing). Utilizing the valid-ready handshake with the credit system provides a way for temporary back-pressuring from the slave.

In example implementations, initialization is also facilitated as when the credit-based approach is applied, the NoC elements will determine the initialization. For example, the initialization of the credits can be zero, whereupon after a reset credits can be passed from the target NoC element to the requesting NoC element. Depending on the desired implementation, a certain number of credits can be provided at the master. However, if the reset for the NoC elements is unknown and the flow is harder to control, the valid-ready handshake can be utilized with the ready allowed for de-assertion. Even though the master element has VC credits, the master may be unable to transmit until the target NoC (slave) element is ready to accept the credits.

In example implementations, different virtual channels may involve different responses (e.g. read response, write response). In example implementations, there can be multiple virtual channels on the read interface going into another controller having only one read response channel. Thus, the congestion may go to the memory controller undergoing different arbitrations with a guaranteed drain. Each channel is completely independent, and they can be used for any purpose according to the desired implementation.

Example implementations involve a bookkeeping mechanism to track responses. Such a mechanism can involve a data structure to store information to track responses and when the responses are received. For example, if there are four VCs, the VCs can be broken into four segments with reservations. The arbiter may determine to send a flit if the NoC element has credit at the output. The example implementations can involve any partition of the data structure between the four VCs according to the desired implementation. For example, each hardware element can be dedicated to a single VC, or pools of resources can be shared with some or all of the VCs. In example implementations, a mix of dedicated and shared resources can also be provided. Dedicated resources can ensure one channel cannot block another channel.
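
The mix of dedicated and shared resources described above can be illustrated with the following sketch. It is a simplified model under assumptions, not the disclosed data structure: the class name ResponseTracker, its methods, and the policy of consuming dedicated entries before shared ones are hypothetical choices made for this example.

```python
class ResponseTracker:
    """Tracks outstanding responses using dedicated entries per VC plus a shared pool."""

    def __init__(self, num_vcs, dedicated_per_vc, shared):
        self.dedicated_free = [dedicated_per_vc] * num_vcs
        self.shared_free = shared
        self.from_shared = [0] * num_vcs  # shared entries currently held by each VC

    def reserve(self, vc):
        """Try to reserve an entry for a request on `vc`; return True on success."""
        if self.dedicated_free[vc] > 0:   # assumed policy: dedicated entries first
            self.dedicated_free[vc] -= 1
            return True
        if self.shared_free > 0:
            self.shared_free -= 1
            self.from_shared[vc] += 1
            return True
        return False

    def release(self, vc):
        """Free one entry when a response arrives; shared entries are returned first."""
        if self.from_shared[vc] > 0:
            self.from_shared[vc] -= 1
            self.shared_free += 1
        else:
            self.dedicated_free[vc] += 1


# Four VCs, one dedicated entry each, and two entries in a shared pool.
tracker = ResponseTracker(num_vcs=4, dedicated_per_vc=1, shared=2)
vc0_results = [tracker.reserve(0) for _ in range(4)]  # VC 0 exhausts the shared pool
vc1_result = tracker.reserve(1)                       # VC 1 still has its dedicated entry
```

In this sketch, VC 0 can consume its own entry and the whole shared pool, yet VC 1 can still make progress because its dedicated entry cannot be taken by another channel.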

FIG. 7(a) illustrates an example system having a SoC element (master) 701, a SoC element (slave) 705, NoC bridges 702, 704 and a NoC 703, in accordance with an example implementation. In the example implementation, the NoC bridges 702, 704 and the NoC elements inside the NoC 703 have four input VCs and four output VCs. A single physical wire proceeds from the SoC element to the bridge, whereupon the signal is fanned out to each NoC element in four output VCs.

FIG. 7(b) illustrates an example architecture for a NoC element, in accordance with an example implementation. In the example implementation, the NoC element has four input VCs and four output VCs. In the example of FIG. 7(b), there is a decoder 711 for the input VCs, a queue 712, an arbiter 713, a multiplexer 714, and an output 715 facilitating output to four output VCs. The single bus feeds into the decoder 711, which receives the input and fans out the input to four individual queues 712. When a VC credit is received, the arbiter 713 pops a flit off of the queue 712 and sends the flit to the multiplexer 714 to be transmitted through the corresponding output VC 715.
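
The datapath of FIG. 7(b) can be modeled in software roughly as follows. This is a behavioral sketch only, under assumptions: the class name NocElement, its method names, and the round-robin order used by the arbiter are hypothetical, since the description does not fix a particular arbitration policy.

```python
from collections import deque


class NocElement:
    """Behavioral model: decoder -> per-VC queues -> credit-aware arbiter -> output VC."""

    def __init__(self, num_vcs=4):
        self.queues = [deque() for _ in range(num_vcs)]  # one queue per input VC
        self.credits = [0] * num_vcs                     # credits per output VC
        self._next = 0  # round-robin pointer (assumed arbitration policy)

    def decode(self, vc, flit):
        """Decoder: fan the incoming flit out to the queue of its VC."""
        self.queues[vc].append(flit)

    def receive_credit(self, vc):
        """Record a VC credit returned for an output VC."""
        self.credits[vc] += 1

    def arbitrate(self):
        """Pop one flit from a VC that has both data and credit, or return None."""
        n = len(self.queues)
        for offset in range(n):
            vc = (self._next + offset) % n
            if self.queues[vc] and self.credits[vc] > 0:
                self.credits[vc] -= 1        # a credit is consumed per flit sent
                self._next = (vc + 1) % n
                return vc, self.queues[vc].popleft()
        return None


elem = NocElement()
elem.decode(0, "a")
elem.decode(2, "b")
first = elem.arbitrate()      # no credits yet, so nothing is sent
elem.receive_credit(2)
second = elem.arbitrate()     # VC 2 now has data and a credit
third = elem.arbitrate()      # VC 0 has data but still no credit
elem.receive_credit(0)
fourth = elem.arbitrate()
```

The returned pair stands in for the multiplexer selecting the output VC for the popped flit. A flit waiting on a VC without credit does not prevent a flit on another VC from being sent.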

As illustrated in FIGS. 7(a) and 7(b), example implementations may involve a NoC that can involve a plurality of channels (e.g., physical channels, virtual channels and/or virtual channels disposed within the physical channels) and NoC hardware elements. Such NoC hardware elements can involve at least one receiving hardware element (e.g., target NoC element 602) and at least one transmitting hardware element (e.g., requesting NoC element 601) as illustrated in FIG. 6. When a transmitting hardware element is to transmit a message, the protocol as illustrated in FIG. 6 can be followed wherein the hardware element transmits a valid signal to the at least one receiving hardware element on a channel of the plurality of channels, and transmits a virtual channel (VC) valid signal on a virtual channel of the plurality of channels to the at least one receiving hardware element. The receiving hardware element is configured to transmit a VC credit to the at least one transmitting hardware element over the virtual channel of the plurality of channels as illustrated in FIG. 6.

Depending on the desired implementation, the transmitting hardware element can be configured to not transmit the VC valid signal on the virtual channel until a VC credit is obtained, and transmit the VC valid signal on the virtual channel to the at least one receiving hardware element on receipt of the VC credit based on the protocol of FIG. 6. In example implementations, the transmitting hardware element can issue a write request when the transmitter determines that the receiving NoC hardware element has enough buffer size for the address information and the storage of data. The transmitting NoC hardware element can infer such information based on the default storage (e.g., 64B) which can be programmable or definable depending on the desired implementation.

In an example implementation, the plurality of channels can also involve virtual channels, with each of the physical channels being configurable to be independently controlled to adjust a number of VCs for each of the plurality of channels. Such implementations can be conducted by a NoC controller which is configured to define the number of VCs for a given physical channel. In an example implementation, the NoC may maintain the same quantity of VCs for read messages as for read response messages within a given physical channel through such a NoC controller, or they can be differing quantities depending on the desired implementation.

In example implementations, the NoC may include a configurable interface for the transmitting hardware element and the receiving hardware element, that configures the transmitting hardware element and the receiving hardware element for at least one of deadlock avoidance and quantity of virtual channels. Such configuration can be conducted through a NoC specification, wherein the interface can be in the form of a hardware/software interface or a hardware mechanism that processes the specification to configure the NoC for deadlock avoidance, and quantity of virtual channels.

In example implementations, the NoC may also include a virtual interface for virtual channels to interact with agents of a SoC. Such a virtual interface can be implemented in the NoC bridges, or can be part of the NoC depending on the desired implementation.

In example implementations, the transmitting element can be configured to manage VC credits received from one or more receiving hardware elements as illustrated in FIG. 8, and conduct arbitration based on whether a message destination is associated with a VC credit from the managed VC credits. The hardware elements can be configured to conduct informed arbitration, as each hardware element knows whether a potential output VC has an associated credit or not based on the information managed as illustrated in FIG. 8.

In further example implementations, the receiving hardware element can be configured to provide a reservation for a VC to one or more transmitting hardware elements based on at least one of management of dedicated VC credits to the one or more transmitting hardware elements, a shared tool providing certain minimum priority for the one or more transmitting hardware elements, and an inference of priority from the one or more of the at least one transmitting hardware element. Such reservations can include a pre-configuration so that certain hardware elements always have a certain number of VC credits reserved, priority inferred based on the type of message received or a hierarchy of hardware elements as defined in the NoC specification.

FIG. 8 illustrates an example table view of information utilized by the NoC element, in accordance with an example implementation. In example implementations, NoC elements may include a bookkeeping mechanism to indicate the status of the target VCs. In the example of FIG. 8, each output VC is associated with a ready signal, and VC credit. When ready and valid are set, then a transfer can take place. VC credit indicates the number of credits available for transmission to the output VC. VC credit is incremented when a credit signal is received, and decremented when a credit is utilized.
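
A table of the kind shown in FIG. 8 might be modeled as in the sketch below. The class name VcTable and its methods are assumptions introduced for illustration, and the sketch additionally assumes, in line with the hybrid credit scheme described earlier, that a transfer also requires at least one available VC credit.

```python
class VcTable:
    """Per-output-VC bookkeeping: a ready flag and a credit counter."""

    def __init__(self, num_vcs):
        self.ready = [False] * num_vcs
        self.credit = [0] * num_vcs

    def on_credit(self, vc):
        """A credit signal was received for `vc`: increment its counter."""
        self.credit[vc] += 1

    def try_transfer(self, vc, valid):
        """Transfer only when valid and ready are set and a credit is available."""
        if valid and self.ready[vc] and self.credit[vc] > 0:
            self.credit[vc] -= 1  # the credit is utilized by this transfer
            return True
        return False


table = VcTable(num_vcs=2)
table.on_credit(1)
blocked = table.try_transfer(1, valid=True)    # credit held, but target not ready
table.ready[1] = True
sent = table.try_transfer(1, valid=True)       # ready, valid and credit all present
exhausted = table.try_transfer(1, valid=True)  # no credit left
```

The first attempt illustrates the temporary back-pressure case: the requester holds a credit, but the transfer waits until ready is asserted, and the credit is not consumed in the meantime.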

FIG. 9 illustrates a flow diagram for a requesting NoC element, in accordance with an example implementation. At 901, the requesting NoC element waits until a VC credit is received before transmitting a request. At 902, once a VC credit is received, the requesting NoC element conducts arbitration among available traffic that is associated with credits, and forwards the data packet to the output interface. At 903, the valid/ready handshake as illustrated in FIG. 6 is conducted, wherein a VC valid signal is provided to indicate the VC that the data will be sent through and the data/flit is sent through the corresponding VC with the VC valid signal. The VC credit counter is decremented. The requesting NoC element will also wait for additional VC credits as necessary. At 904, the receiving element receives the data/flit from the transmitting element.

In example implementations there can be a system such as a NoC, a SoC, or any hardware element system that requires a virtual channel interface that involves a plurality of channels; at least one receiving hardware element; and at least one transmitting hardware element configured to: transmit a valid signal to the at least one receiving hardware element on a channel of the plurality of channels, and transmit a virtual channel (VC) valid signal as a virtual channel indicator for a virtual channel of a plurality of virtual channels designated for transmission of data and transmit the data on the virtual channel designated for the transmission of the data; wherein the at least one receiving hardware element is configured to transmit a VC credit to the at least one transmitting hardware element as illustrated in FIG. 6 and FIG. 9.

In example implementations, the at least one transmitting hardware element is configured to not transmit the data packet on the virtual channel until a VC credit is obtained. The plurality of channels can be physical channels that are partitioned into one or more virtual channels, and each of the channels can be configurable to be independently controlled for mapping to interface VCs. In such example implementations, multiple transmitting channels can map to a single interface virtual channel, or a single transmitting channel can map to multiple virtual channels depending on the desired implementation. In an example implementation involving a single transmitting channel mapping to multiple virtual channels, the transmission can be conducted when any of the VC credits are available. The mapping can be done through a virtual interface connected to the NoC to map virtual channels with transmitting elements such as agents of a SoC. Such interfaces can include read channels, read response channels, and so on depending on the desired implementation. In example implementations, the interface can include the decoder, queue, arbiter, multiplexer, and/or the output as illustrated in FIG. 7(b).

In example implementations, the at least one transmitting element is further configured to manage VC credits received from one or more of the at least one receiving hardware element; and conduct arbitration based on whether a message destination is associated with a VC credit from the managed VC credits as illustrated in FIG. 8. The management can be done through the interface of the hardware element that is configured to map transmitting channels to virtual channels.

In example implementations, the at least one transmitting hardware element is configured to arbitrate messages for transmitting through prioritizing messages that are associated with a VC credit through the use of the arbiter as illustrated in FIG. 7(b).

In example implementations, the at least one receiving hardware element is configured to provide a reservation for a VC to one or more of the at least one transmitting hardware element based on at least one of management of dedicated VC credits to the one or more of the at least one transmitting hardware element, and an inference of priority from the one or more of the at least one transmitting hardware element based on the information of FIG. 8. Priority can be inferred based on the type of message and the hierarchy set according to the desired implementation (e.g., hierarchy for read, read response, write, etc.).

In example implementations the at least one receiving hardware element can be a NoC element such as a router or a bridge and the at least one transmitting hardware element is an agent of the System on Chip (SoC), such as a memory or a CPU.

Although example implementations involve a NoC, other systems such as a SoC or other interconnect can be utilized in accordance with the desired implementation. Any hardware element that can utilize a virtual interface can take advantage of the example implementations described herein.

Unless specifically stated otherwise, as apparent from the discussion, it is appreciated that throughout the description, discussions utilizing terms such as “processing,” “computing,” “calculating,” “determining,” “displaying,” or the like, can include the actions and processes of a computer system or other information processing device that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system's memories or registers or other information storage, transmission or display devices.

Example implementations may also relate to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may include one or more general-purpose computers selectively activated or reconfigured by one or more computer programs. Such computer programs may be stored in a computer readable medium, such as a computer-readable storage medium or a computer-readable signal medium. A computer-readable storage medium may involve tangible mediums such as, but not limited to optical disks, magnetic disks, read-only memories, random access memories, solid state devices and drives, or any other types of tangible or non-transitory media suitable for storing electronic information. A computer readable signal medium may include mediums such as carrier waves. The algorithms and displays presented herein are not inherently related to any particular computer or other apparatus. Computer programs can involve pure software implementations that involve instructions that perform the operations of the desired implementation.

Various general-purpose systems may be used with programs and modules in accordance with the examples herein, or it may prove convenient to construct a more specialized apparatus to perform desired method steps. In addition, the example implementations are not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the example implementations as described herein. The instructions of the programming language(s) may be executed by one or more processing devices, e.g., central processing units (CPUs), processors, or controllers.

As is known in the art, the operations described above can be performed by hardware, software, or some combination of software and hardware. Various aspects of the example implementations may be implemented using circuits and logic devices (hardware), while other aspects may be implemented using instructions stored on a machine-readable medium (software), which if executed by a processor, would cause the processor to perform a method to carry out implementations of the present disclosure. Further, some example implementations of the present disclosure may be performed solely in hardware, whereas other example implementations may be performed solely in software. Moreover, the various functions described can be performed in a single unit, or can be spread across a number of components in any number of ways. When performed by software, the methods may be executed by a processor, such as a general purpose computer, based on instructions stored on a computer-readable medium. If desired, the instructions can be stored on the medium in a compressed and/or encrypted format.

Moreover, other implementations of the present disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the teachings of the present disclosure. Various aspects and/or components of the described example implementations may be used singly or in any combination. It is intended that the specification and example implementations be considered as examples only, with the true scope and spirit of the present disclosure being indicated by the following claims.

What is claimed is:
 1. A hardware element incorporated into a Network on Chip (NoC), comprising: a plurality of physical links and virtual links; a queue for transmission of output messages to output ports of the hardware element; an arbiter configured to process input messages to the queue based on a logic scheme; a configurable bypass link between the virtual links, and bypass logic configured to redirect the input messages to the configurable bypass link to bypass the queue and the arbiter.
 2. The hardware element of claim 1, wherein the bypass logic is configured to redirect the input messages to the configurable bypass link opportunistically.
 3. A hardware element incorporated into a System on Chip (SoC), comprising: a plurality of physical links and virtual links; a queue for transmission of output messages to output ports of the hardware element; an arbiter configured to process input messages to the queue based on a logic scheme; a configurable bypass link between the virtual links, and bypass logic configured to redirect the input messages to the configurable bypass link to bypass the queue and the arbiter.
 4. The hardware element of claim 3, wherein the bypass logic is configured to redirect the input messages to the configurable bypass link opportunistically.